Data Mining for Very Busy People

Authors

  • Tim Menzies
  • Ying Hu
Abstract

For 21st-century businesses, the problem is not accessing data but ignoring irrelevant data. Most modern businesses can electronically access mountains of data, such as transactions for the past two years or the state of their assembly line. The trick is using the available data effectively. In practice, this means summarizing large data sets to find the “pearls in the dust,” that is, the data that really matters. In the data mining community, “learning the least” is an uncommon goal. Most data miners are zealous hunters seeking detailed summaries and generating extensive and lengthy descriptions. The “Data Mining and Treatment Learning” sidebar discusses some work in this area. Here, we take a different approach and assume that busy people don’t need (or can’t use) complex models. Rather, they want only the data they need to achieve the most benefits. Instead of finding extensive descriptions of things, the TAR2 “treatment learner,” a data mining tool (http://menzies.us/rx.html), hunts for a minimal difference set between things. A list of essential differences is easier to read and understand than detailed descriptions. Overly elaborate models can complicate, not clarify, a situation. Cognitive scientists and researchers studying human decision making note that people often use simple models rather than intricate ones. Because it learns smaller models, TAR2 provides better support for real-world decision making than standard data miners.
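
The idea of hunting for a minimal difference set can be illustrated with a toy sketch. This is not TAR2's actual algorithm, which the abstract does not describe. It only assumes that a "treatment" is a single attribute=value constraint and scores it by how much it raises the share of preferred outcomes relative to the whole data set. The function name `best_single_treatment`, the `lift` score, and the sample rows are all hypothetical and chosen for illustration.

```python
def best_single_treatment(rows, target, preferred):
    """Find the one attribute=value pair that most raises the share of
    rows whose `target` equals `preferred` (a toy stand-in for a treatment)."""
    baseline = sum(r[target] == preferred for r in rows) / len(rows)
    if baseline == 0:
        return None, 1.0  # no preferred rows at all, so nothing can improve on it

    # Every attribute=value pair seen in the data is a candidate treatment.
    candidates = sorted(
        {(k, v) for r in rows for k, v in r.items() if k != target}
    )

    best, best_lift = None, 1.0
    for attr, value in candidates:
        subset = [r for r in rows if r.get(attr) == value]
        share = sum(r[target] == preferred for r in subset) / len(subset)
        lift = share / baseline  # > 1 means the constraint improves on the baseline
        if lift > best_lift:
            best, best_lift = (attr, value), lift
    return best, best_lift


# Hypothetical assembly-line records, invented for this sketch.
rows = [
    {"shift": "day", "line": "A", "outcome": "good"},
    {"shift": "day", "line": "B", "outcome": "good"},
    {"shift": "night", "line": "A", "outcome": "bad"},
    {"shift": "night", "line": "B", "outcome": "bad"},
    {"shift": "day", "line": "A", "outcome": "good"},
    {"shift": "night", "line": "B", "outcome": "good"},
]

treatment, lift = best_single_treatment(rows, target="outcome", preferred="good")
print(treatment, round(lift, 2))
```

The result is a single constraint rather than a full model of the data, which is the kind of short, readable output the abstract argues busy decision makers can actually use.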



Journal:
  • IEEE Computer

Volume 36

Publication date: 2003